

Search for: All records

Creators/Authors contains: "Kasi, Harsha"


  1. A key challenge in conducting comparative analyses across social units, such as religions, ethnicities, or cultures, is that data on these units is often encoded in distinct and incompatible formats across diverse datasets. This can involve simple differences in the variables and values used to encode these units (e.g., Roman Catholic is V130 = 1 vs. Q98A = 2 in two different datasets) or differences in the resolutions at which units are encoded (Maya vs. Kaqchikel Maya). These disparate encodings can create substantial challenges for the efficiency and transparency of data syntheses across diverse datasets. We introduce a user‐friendly set of tools to help users translate four kinds of categories (religion, ethnicity, language, and subdistrict) across multiple, external datasets. We outline the platform's key functions and current progress, as well as long‐range goals for the platform.
  2. Scientists and policymakers are increasingly leveraging complex, multi-scale data from diverse, worldwide sources to understand the causes and consequences of economic development, social stratification, climate change, cultural diversity, and violent conflict. This work frequently requires integrating data across diverse datasets by complex, dynamic categories (e.g., ethnicities, languages, religions, subdistricts). However, different datasets encode corresponding categories in disparate formats and at different resolutions (e.g., Guatemala Indigenous vs. Maya vs. K’iche’). These diverse encodings must be translated across datasets before the data can be brought together for analysis. At global scales across thousands of categories, the combinatorial complexity creates thorny challenges for manual reconciliation and for transparent documentation and sharing of researcher decisions. There is a need to investigate direct and uncomplicated ways to support searching and exploring the semantics of complex and diverse datasets. We design and deploy such a tool, CatMapper, to support semantic discovery through exploration and manipulation of large, complex, and diverse datasets. CatMapper enables exploring contextual information about specific categories; translating new sets of categories from existing datasets and published studies; identifying and integrating novel combinations of datasets for researchers’ custom needs, including automatically generated syntax to merge datasets of interest; and publishing and sharing merging templates for public re-use and open science. CatMapper does not store observational data. Rather, it is a dynamic, interactive dictionary of keys to help users integrate observational data from diverse external datasets in disparate formats, thereby complementing and leveraging a fast-growing ecology of datasets storing observational data. We have conducted a heuristic evaluation of CatMapper's usability. The results shed light on ways to enrich semantic data discovery.
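
Both abstracts describe translating dataset-specific encodings into shared categories so that records can be combined. The idea of a "dictionary of keys" might be sketched roughly as follows. This is a minimal illustration only, not CatMapper's actual data model or API: the dataset names (`survey_a`, `survey_b`), the shared key `religion:roman_catholic`, the respondent IDs, and both function names are hypothetical. Only the encodings V130 = 1 and Q98A = 2 come from the first abstract.

```python
# Hypothetical crosswalk: (dataset, variable, value) -> shared category key.
# Dataset names and the key string are invented for illustration.
CROSSWALK = {
    ("survey_a", "V130", 1): "religion:roman_catholic",
    ("survey_b", "Q98A", 2): "religion:roman_catholic",
}


def translate(dataset, rows):
    """Return copies of rows with a 'category_key' field added.

    The key is None when no (variable, value) pair in the row
    appears in the crosswalk for this dataset.
    """
    translated = []
    for row in rows:
        key = None
        for variable, value in row.items():
            key = CROSSWALK.get((dataset, variable, value), key)
        translated.append({**row, "category_key": key})
    return translated


def merge_by_category(*translated_datasets):
    """Group translated rows from several datasets under their shared key."""
    merged = {}
    for rows in translated_datasets:
        for row in rows:
            if row["category_key"] is not None:
                merged.setdefault(row["category_key"], []).append(row)
    return merged


# Made-up observational rows; the crosswalk itself stores no such data.
rows_a = [{"respondent": "a1", "V130": 1}, {"respondent": "a2", "V130": 3}]
rows_b = [{"respondent": "b1", "Q98A": 2}]

merged = merge_by_category(
    translate("survey_a", rows_a),
    translate("survey_b", rows_b),
)
```

Keeping the crosswalk separate from the observational rows mirrors the second abstract's point that the tool holds keys rather than data; rows with no known mapping are retained by `translate` but left out of the merged view.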